EP 0 467 152 B1

Description

BACKGROUND OF THE INVENTION:

Field of the Invention:

[0001] The present invention relates to a microprocessor which is capable of processing a variable-length instruction set and which decodes a plurality of instructions in parallel.

Description of the Prior Art:

[0002] In any prior-art microprocessor capable of processing a variable-length instruction set, the parallel decode of instructions is not performed.

[0003] As a known example pertinent to the present invention, there is mentioned an instruction decoding method stated in a treatise "32-bit microprocessor V80 wherein the disturbance of a pipeline is suppressed by building in a cache and a branch prediction mechanism, etc., thereby to enhance a performance" contained on pp. 195 - 206 in NIKKEI ELECTRONICS BOOKS "New-Generation Microprocessors RISC, CISC, TRON" published on September 11, 1989.

[0004] In the known microprocessor, a plurality of instructions are not really decoded in parallel, but an instruction is decoded in two stages, thereby to enhance the throughput of decode capability. The first-stage decode circuit of this known microprocessor is called a pre-decode unit, which has the function of decomposing a variable-length instruction into elements of fixed length. The instruction decomposed into the fixed-length elements in this manner is once stored in a buffer (register) within the pre-decode unit, and it is transferred from the pre-decode unit to an instruction decode unit in compliance with the request of the instruction decode unit.

[0005] Meanwhile, the official gazette of Japanese Patent Application Laid-open No. 244233/1988 discloses a microprocessor which is intended to shorten the decode time period of a variable-length machine language by decoding a plurality of unit instructions in parallel. With the microprocessor, the machine language instructions of 2 bytes are accepted from outside each time, and the unit instruction of the first byte and that of the second byte are respectively decoded by a first decoder and a second decoder. A first selector selects one decoded result from among a plurality of decoded results delivered from the first decoder. A second selector selects one decoded result from among a plurality of decoded results delivered from the second decoder, in accordance with the decode information delivered from the first selector. The select operation of the first selector is determined in accordance with the decode information delivered from the second selector. According to the microprocessor thus constructed, the machine language instructions of 2 bytes can be decoded in one machine cycle, and the decode time period of the variable-length instruction can be shortened.

[0006] The US-A-3 566 366 discloses a stored program computer in which both half word length and full word length instructions are employed. A circuit arrangement selectively omits execution of no-operation half word length instructions. The second instruction of a pair of half word length instructions is decoded during execution of the first instruction of the pair and an output signal is generated when the second instruction is a no-operation instruction.

[0007] In the EP-A-0 354 740, a data processing apparatus for decoding and executing instructions in a parallel manner in a variable word length instruction format is disclosed. A plurality of decoders is used in which, while the primary instruction decoder is decoding an instruction, the probability of parallel decoding of the next instruction is detected, so that the primary instruction decoder and a secondary instruction decoder decode a variable word length instruction and a fixed word length instruction, respectively, in a parallel manner.

[0008] The prior art known from the EP-A-0 354 740 mentioned above forms the basis for the preamble of attached claim 1.

SUMMARY OF THE INVENTION:

[0009] The inventors' study, however, has revealed that, with the prior art technique disclosed in the aforementioned treatises, two problems are involved in case of raising the speed of the processing of the microprocessor still more.

[0010] The first problem is that, since the instruction decode uses two stages in the pipeline, branch processing slows down to the corresponding extent.

[0011] That is, in a case where the branch processing is started and is followed by fetching and pre-decoding a branch destination instruction, a time period expended on the branch increases to the amount of one stage more than in a microprocessor which requires only one stage for decode processing.

[0012] As the second problem, in a case where the method of decoding an instruction in two stages as in the prior-art technique is adopted in a microprocessor which executes a plurality of instructions in parallel, the pre-decode unit governs the performance of the whole microprocessor. The reason is that, since the instruction to be processed is in the state of a variable length, the succeeding instruction cannot be pre-decoded unless the pre-decode unit pre-decodes the preceding instruction. That is, the pre-decode unit can pre-decode only one instruction at a time.

[0013] It has also been revealed by the inventors' study that there are three solving methods for the second problem.

[0014] The first solution is that a plurality of pre-decode circuits for pre-decoding a plurality of instructions are connected in series. Herein, the succeeding pre-decode circuit refers to the output of the preceding pre-decode circuit. Moreover, the plurality of pre-decode circuits are designed so as to be operable within one cycle. Then, the problem can be solved. In this case, however, the delay time of the pre-decode circuits connected in series becomes problematic.

[0015] The second solution is the method in which the
pre-decode unit is endowed with a performance capable 
of pre-decoding one instruction in one cycle, whereupon 
the difference between the processing performances of 
the pre-decoder and the instruction decoder is absorbed 
by a buffer arranged between them. Since, however, the 
maximum throughput becomes one instruction/cycle 
with this method, the performance of the microproces- 
sor is not considerably enhanced in spite of the fact that 
the microprocessor is specially permitted to execute the 
plurality of instructions at the other stage. 

[0016] The third solution is that, as in the present in- 
vention, the succeeding instruction is decoded by plac- 
ing any assumption on the format of the preceding in- 
struction. 

[0017] The present invention has been made in prac- 
ticably realizing the third solution, and has for its object 
to provide a microprocessor which can decode a plural- 
ity of instructions at high speed and in parallel in case 
of processing a variable-length instruction set. 

[0018] Meanwhile, regarding the prior-art technique 
disclosed in the aforementioned official gazette of Jap- 
anese Patent Application Laid-open, the inventors' 
study has revealed the following disadvantage: In order 
to decode all the patterns of the unit instructions, a large 
number of instructions need to be decoded in parallel in 
the first and second instruction decoders, and one decoded
result needs to be selected from among a large
number of decoded results by the selectors. Therefore, 
the hardware quantities of the instruction decoders and 
the selectors become enormous. 

[0019] It is accordingly another object of the present 
invention to provide a microprocessor which can decode 
a plurality of instructions at high speed and in parallel 
while restraining the quantity of its hardware to the min- 
imum. 

[0020] In order to accomplish the objects, according 
to the present invention, an instruction is decoded under 
the assumption of the instruction length thereof. 

[0021] Subsequently, when the assumption has been 
found correct by the decode of the instruction, the de- 
coded result of a succeeding instruction is also judged 
correct. To the contrary, when the assumption has been 
found erroneous, the decoded result of the succeeding 
instruction is judged erroneous, and it is invalidated. 
[0022] Further, the assumptive instruction length should desirably be the length of the shortest instruction format in an instruction set. The reason is that the instruction format which is the shortest in the variable-length instruction set corresponds to instructions that are used with high frequency, so the assumption holds with high probability.

[0023] Besides, in order to decode a plurality of instructions in parallel, an instruction prefetch unit transfers an instruction code whose length is at least double the shortest instruction format, to an instruction decode unit.

[0024] In the instruction decode unit, the instruction code is input to the individual instruction decoders in units of the length of the shortest instruction format. Each of the instruction decoders is capable of decoding, at least, the instructions having the shortest instruction format, and at least one of the instruction decoders is capable of decoding all the instructions of the instruction set. It is also possible to hold the outputs of the respective instruction decoders in output latches different from one another.
[0025] A microprocessor according to a typical embodiment of the present invention is outlined as follows:

[0026] The microprocessor is characterized by comprising:

(1) a fetch unit (IU) which fetches first and second instructions each having an instruction length of predetermined bit width (16 bits), from outside said microprocessor, and which delivers the first and second instructions to output lines in parallel, said output lines having a bit width (32 bits) that is at least double the predetermined width;

(2) a first instruction decoder (ID0) whose input is supplied with the first instruction on said output lines of said fetch unit (IU);

(3) a second instruction decoder (ID1) whose input is supplied with the second instruction on said output lines of said fetch unit (IU);

(4) a control unit (PCNT) which is supplied with a decoded result (id0_out) of said first instruction decoder and that (id1_out) of said second instruction decoder; and

(5) an instruction execution unit (EU) which responds to an output from said control unit (PCNT);

wherein under a condition under which the first instruction of the predetermined instruction length is delivered from said output lines having the bit width that is at least double the predetermined width, said control unit (PCNT) responds to information on fulfillment of the condition in the decoded result (id0_out) of said first instruction decoder (ID0) and validates the decoded result (id1_out) of said second instruction decoder (ID1), so that said instruction execution unit (EU) executes the first instruction and the second instruction in parallel in response to the decoded results (id0_out, id1_out) of said first and second instruction decoders transmitted as the output of said control unit;

whereas under another condition under which an instruction having an instruction length different from the predetermined bit width is delivered from said output lines of said fetch unit (IU), said control unit (PCNT) responds to information on fulfillment of the other condition in the decoded result (id0_out) of said first decoder (ID0) and invalidates the decoded result (id1_out) of said second decoder (ID1), so that said instruction execution unit (EU) executes the first instruction in response to the decoded result (id0_out) of said first instruction decoder (ID0) transmitted as the output of said control unit (PCNT).

[0027] It is decided whether or not the instruction codes processed by the instruction decoders correspond to the instructions which can be decoded by the respective instruction decoders (that is, the instructions which have the shortest instruction format). In a case where, as the result of the decision, any of the instruction decoders has decoded the instruction having any different instruction format, the decoded results of the instruction codes succeeding the particular instruction are all invalidated. The invalidation is readily realized using a control circuit. To the contrary, in a case where, as the result of the decision, all the instruction decoders have decoded the instructions having the shortest instruction format, all the decoded results are valid. On this occasion, the throughput of the instruction decode is the maximum, and the instructions equal in number to the instruction decoders are processed in one cycle.
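The validity decision described in this paragraph can be sketched in a few lines of Python. This is only an illustrative model, not the circuit of the embodiment: the function name `valid_mask`, the constant `SHORTEST_UNITS`, and the representation of each decoder's finding as a length in 16-bit units are assumptions introduced here for the example.

```python
from typing import List

# Assumption for this sketch: lengths are counted in units of the shortest
# instruction format, so a shortest-format instruction has length 1.
SHORTEST_UNITS = 1


def valid_mask(decoded_lengths: List[int]) -> List[bool]:
    """Return, per decoder, whether its decoded result is kept.

    decoded_lengths[k] is the instruction length that decoder k reports for
    the code on its input. The first decoder's result is always kept. Once
    some decoder reports a length other than the shortest one, the results
    of all decoders after it are invalidated.
    """
    mask = []
    assumption_holds = True
    for length in decoded_lengths:
        mask.append(assumption_holds)
        if length != SHORTEST_UNITS:
            # Every later decoder was fed a code that is not an instruction
            # head, so its output must be discarded.
            assumption_holds = False
    return mask


if __name__ == "__main__":
    print(valid_mask([1, 1]))  # both results valid
    print(valid_mask([2, 1]))  # second result invalidated
```

When every entry is the shortest length, all results are kept and the number of instructions handled in the cycle equals the number of decoders, which is the maximum-throughput case named in the text.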

[0028] Thus, the maximum throughput of the instruction decode can be rendered two or more instructions/cycle though subject to the cases of the correct assumption, and the second problem stated before can be solved. Moreover, since the instruction length is assumed, the variable-length instruction need not be decomposed into the fixed-length elements by the pre-decode circuit, and the first problem stated before can be solved.

[0029] In addition, according to the present invention, the second instruction decoder executes significant decode concerning only the instruction head code of the instruction (in other words, one sort of decode), and the insignificant decoded result of the second instruction decoder is invalidated under any other condition (in other words, in case of a non-head code). Therefore, the plurality of instructions can be decoded at high speed and in parallel while the hardware quantity of the second instruction decoder is restrained to the minimum.

[0030] Unlike the pre-decoding method hitherto known, the instruction decoding method of the present invention decodes an instruction under an erroneous assumption in a certain case. In this case, the decoded result is invalidated as described above, and the throughput becomes one instruction/cycle. In this manner, the processing performance depends upon the instruction format more in the method of the present invention than in the pre-decoding method. This point can be coped with in such a way that the instructions which have the format fulfilling the assumption are used to the utmost in a program.

[0031] Other objects and features of the present invention will become apparent from the ensuing description of embodiments taken in conjunction with the accompanying drawings.


BRIEF DESCRIPTION OF THE DRAWINGS: 

[0032]

Fig. 1 shows a block diagram of a microprocessor which is an embodiment of the present invention;

Fig. 2 shows the six sorts of instruction lengths of a variable-length instruction set which the microprocessor of the embodiment has;

Fig. 3 shows an example of the row of instructions in a memory as to the instruction set of the embodiment;

Figs. 4(A) and 4(B) show the values of signal lines i0 - i5 in the case where the microprocessor shown in Fig. 1 executes the instruction row in Fig. 3, as to two certain points of time;

Fig. 5 shows a detailed arrangement diagram of a control circuit PCNT which is one of the constituents of the microprocessor in Fig. 1; and

Fig. 6(A) shows the changes of control signals which are generated by instruction decode in the case where the instruction row in Fig. 3 is executed by the microprocessor in Fig. 1, while Fig. 6(B) shows the changes of control signals in the case of employing an architecture in which the microprocessor in Fig. 1 includes only one instruction decoder.

DESCRIPTION OF THE PREFERRED 
EMBODIMENTS: 


[0033] Fig. 1 is a block diagram of a microprocessor to which the present invention is applied. The present invention makes it possible to decode a plurality of instructions in parallel. Here will be described the internal architecture and operation of the microprocessor which decodes two instructions in parallel as the simplest aspect of the parallel decode of the plurality of instructions.

Internal Architecture of Microprocessor

[0034] First, the internal architecture of the microprocessor will be described with reference to Fig. 1. The microprocessor in Fig. 1 is basically constructed of an interface unit IOU, an instruction prefetch unit IU, an instruction decode unit DU and an execution unit EU. These units are capable of parallel operations, and pipeline processing is performed under the control of the instruction decode unit DU.

Interface Unit IOU

[0035] The microprocessor is connected with external devices (for example, a main memory) through the interface unit IOU. This interface unit IOU transfers both instructions and data between the microprocessor and the main memory.

[0036] More specifically, an instruction fetched from the main memory is transferred from the interface unit IOU to the instruction prefetch unit IU through signal lines having a width of 64 bits.

[0037] On the other hand, data computed by the execution unit EU is transferred from this execution unit EU to the interface unit IOU through signal lines in two sets each consisting of 32 bits, while data fetched from the main memory is transferred from the interface unit IOU to the instruction decode unit DU.

Instruction Prefetch Unit IU

[0038] The instruction prefetch unit IU has a prefetch queue PFQ. The instructions transferred from the interface unit IOU are once latched in the prefetch queue PFQ and aligned in 16-bit units, whereupon the aligned instructions are delivered to the instruction decode unit DU. The prefetch queue PFQ is a queue of FIFO (First-In First-Out).

[0039] The instructions after the alignment are transferred from the instruction prefetch unit IU to the instruction decode unit DU through six sets of 16-bit signal lines i0 - i5. Here, the signal line i0 bears the head code of the instruction to be decoded in the next machine cycle, and the signal lines i1 - i5 bear the row of the instructions succeeding the instruction of the signal line i0. The signal line i0 is connected to a first instruction decoder ID0. Similarly, the signal line i1 is connected to a second instruction decoder ID1. It is the feature of the embodiment of the present invention that the input of the second instruction decoder ID1 is uniquely determined by the signal of the signal line i1 and is not selected from among the signals of the signal lines i1 - i5. Besides, the first instruction decoder ID0 has the function of decoding all instructions which can be processed by the microprocessor. In contrast, the second instruction decoder ID1 can decode only instructions in an instruction format having a length of 16 bits or 32 bits, among the instructions which the microprocessor can execute. The decoded results of the instructions in the first instruction decoder ID0 and the second instruction decoder ID1 are respectively delivered to signal lines id0_out and id1_out and then sent to a pipeline control unit PCNT.

Pipeline Control Unit PCNT

[0040] The pipeline control unit PCNT generates control signals for the units IOU, IU and EU on the basis of the signals of the signal lines id0_out and id1_out and signals (not shown in the figure) indicating the statuses of these units IOU, IU and EU.

Expansion Part Generator EG

[0041] In addition, the instruction decode unit DU includes an expansion part generator EG, by which immediate data or displacement data in the instructions is expanded to 32 bits and then delivered. The position and length of the immediate data or displacement data in any instruction are designated in the operation code of the instruction, and the data is obtained by decoding the operation code. The expansion part generator EG processes the data on the basis of the designation, and delivers the processed data to a bus d0 or d1. The reason why the expansion part generator EG has two sets of 32-bit output lines, is that the data items are transferred independently under the respective controls of the first instruction decoder ID0 and the second instruction decoder ID1.

Execution Unit EU

[0042] Two integral arithmetic logic units ALU are similarly disposed in the execution unit EU so as to correspond to the first instruction decoder ID0 and the second instruction decoder ID1, respectively.

Register File RF

[0043] A register file RF in the instruction decode unit DU is configured of sixteen 32-bit registers R0 thru R15. Each of the registers has four read ports and two write ports, totaling six ports. Among these ports, one half (two read ports and one write port) corresponds to the side of the first instruction decoder ID0 and is connected to the first arithmetic logic unit ALU0. Likewise, the ports of the other half correspond to the side of the second instruction decoder ID1 and are connected to the second arithmetic logic unit ALU1.

Signal Lines of 32-bit Width

[0044] The instruction decode unit DU and the execution unit EU are connected by six sets of signal lines d0, d1, d2, d3, e0 and e1 each having a width of 32 bits. Among them, the four sets (d0, d1, d2, d3) are used for transferring data from the instruction decode unit DU to the execution unit EU, while the remaining two sets (e0, e1) are used for transferring data from the execution unit EU to the instruction decode unit DU.

[0045] By way of example, let's consider a case where the first arithmetic logic unit ALU0 processes the instruction of adding the values of the registers R0 and R1 and then setting the sum in the register R1. In this case, the values of the registers R0 and R1 are first read out from the register file RF and respectively delivered to the 32-bit signal lines d0 and d1. At the next execution stage to the instruction decode unit DU in the pipeline, that is, in the execution unit EU, the first arithmetic logic unit ALU0 receives the values from the signal lines d0 and d1 and adds them up. The result of the addition is delivered to the signal lines e0. Further, at the stage of register store which is the next processing, the processing proceeds in the instruction decode unit DU again, and the value on the signal lines e0 is set in the register R1 within the register file RF. The above is an operation employing the side of the first instruction decoder ID0. In case of employing the side of the second instruction decoder ID1, the signal lines d2, d3 and e1 and the second arithmetic logic unit ALU1 are used. More specifically, the values of the registers R0 and R1 are respectively delivered to the signal lines d2 and d3, and they are added by the second arithmetic logic unit ALU1. Thereafter, the result of the addition is transferred to the register R1 by the use of the signal lines e1.

[0046] In the case of transferring data between the microprocessor and the memory, signal lines in two sets each consisting of 32 bits as laid between the signal lines e0, e1 and the interface unit IOU are used. Since the operation of this part is not directly relevant to the present invention, it shall be omitted from description.

[0047] The effect of the present invention is that the parallel decode of a plurality of instructions becomes possible. In this embodiment, the microprocessor having the instruction set of variable-length instructions will be taken as an example. Therefore, what the variable-length instruction is will be first explained briefly.

Variable-length Instruction 

[0048] In short, the "variable-length instruction" means an instruction which has a plurality of instruction formats and whose length changes when the different instruction formats are taken. In other words, an instruction set including any instruction of different length has the instruction of variable length.

Fixed-length Instruction

[0049] In contrast, a case where all instructions have a fixed length is generally called an "instruction set of fixed length".

Instruction Set of This Embodiment

[0050] As shown in Fig. 2, this embodiment assumes the set of instructions which have six sorts of lengths of 16 bits thru 96 bits in 16-bit units. In the memory, the instructions are aligned on 16-bit boundaries. That is, the 16-bit elements of the instructions are all located at addresses of even-numbered bytes. This situation is illustrated in Fig. 3.
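The length and alignment rules of this paragraph can be illustrated with a small Python sketch. The names `UNIT_BITS`, `VALID_LENGTHS` and `layout` are hypothetical helpers introduced for the example, and the sketch assumes that the row of instructions starts at an even byte address.

```python
UNIT_BITS = 16
# The six instruction lengths of the embodiment: 16, 32, 48, 64, 80, 96 bits.
VALID_LENGTHS = tuple(UNIT_BITS * n for n in range(1, 7))


def layout(lengths_bits, base=0):
    """Return the byte addresses of the 16-bit elements of each instruction.

    lengths_bits lists instruction lengths in program order. The result has
    one inner list per instruction, holding the address of every 16-bit
    element. `base` is assumed to be an even byte address.
    """
    if base % 2 != 0:
        raise ValueError("base must be an even byte address")
    addr = base
    placed = []
    for bits in lengths_bits:
        if bits not in VALID_LENGTHS:
            raise ValueError(f"unsupported instruction length: {bits}")
        units = bits // UNIT_BITS
        placed.append([addr + 2 * k for k in range(units)])
        addr += 2 * units  # each 16-bit element occupies two bytes
    return placed


if __name__ == "__main__":
    # Two 16-bit instructions followed by one 32-bit instruction.
    print(layout([16, 16, 32]))
```

Because every instruction length is a multiple of 16 bits, every element address produced from an even base is itself even, which matches the statement in the paragraph above.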

[0051] Next, the operation of the parallel decode of instructions in this embodiment will be described.

[0052] Fig. 3 shows one example of the row of instructions in the memory. The individual instructions are indicated as, for example, inst0 and inst1. The instruction whose length exceeds 16 bits is indicated as, for example, inst2_0 and inst2_1 by further affixing underscores and numerals. That is, the instruction longer than 16 bits is divided into a plurality of elements. It is also assumed that a code which must be subjected to decode processing in each instruction is limited to the head code of the instruction. In other words, it is assumed that the non-head code of each instruction is immediate data or displacement data. In the case of the instruction inst2 by way of example, the first code inst2_0 needs to be decoded, but the succeeding code inst2_1 need not be decoded.

[0053] Under the above premises, Figs. 4(A) and 4(B) show the statuses of the 16-bit signal lines i0 - i5 at two certain points of time, the signal lines constituting the transfer bus from the instruction prefetch unit IU to the instruction decode unit DU. Fig. 4(A) illustrates the statuses in which the instruction row in Fig. 3 has already been accepted in the prefetch queue PFQ of the instruction prefetch unit IU, and in which the first instruction inst0 is about to be decoded. In the first half of the next machine cycle, the first instruction inst0 is decoded by the first instruction decoder ID0, and the succeeding instruction inst1 by the second instruction decoder ID1. As the results of the decoding, it is found that the two instructions inst0 and inst1 are both in the instruction format having the shortest length. Then, a command is issued from the instruction decode unit DU to the instruction prefetch unit IU so as to advance the pointer of instructions by 32 bits. In consequence, after a further half machine cycle, the signal lines i0 - i5 between the instruction prefetch unit IU and the instruction decode unit DU fall into the statuses shown in Fig. 4(B) in which the two instructions inst0 and inst1 have been taken away and in which the instructions inst5 and inst6 are added instead. On this occasion, the instruction code inst2_0 is decoded by the first decoder ID0, and the instruction code inst2_1 by the second decoder ID1. As the decoded result of the instruction code inst2_0 in the first decoder ID0, it is found that the instruction inst2 is not in the instruction format having the shortest length.

[0054] In a case where the shortest instruction is input to the first instruction decoder ID0, the head operation code of the next instruction is input to the second instruction decoder ID1. The second instruction decoder ID1 decodes the instruction, assuming such input of the head operation code of the next instruction. Therefore, in a case where the instruction decoded in the first instruction decoder ID0 is a non-shortest instruction, the instruction decode in the second instruction decoder ID1 is judged erroneous. The judged result of the error is reflected in the output id0_out of the first instruction decoder ID0, and invalidation processing is performed in the pipeline control unit PCNT in response to this judged result. As shown in Fig. 1, the decoded results, namely, the outputs id0_out and id1_out are sent from the first and second instruction decoders ID0, ID1 to the pipeline control unit PCNT. The output id0_out contains information indicating whether or not the instruction decoded in the first decoder ID0 is in the instruction format of the shortest length. On the other hand, the output id1_out may well contain information indicating that the instruction which the second decoder ID1 cannot decode has been input. In this embodiment, however, it is supposed that such information is not contained in the output id1_out.

[0055] The decoded result of the second instruction decoder ID1 must be invalidated in conformity with the information contained in the output id0_out as indicates that the length of the instruction having been input to the first instruction decoder ID0 is not the shortest or 16 bits. The processing for the invalidation is performed by the pipeline control unit PCNT as stated above.

Detailed Block Diagram of Pipeline Control Unit PCNT 

[0056] Fig. 5 shows a detailed block diagram of the pipeline control unit PCNT.

[0057] The pipeline control unit PCNT is configured of a pipeline stage control unit Pipe_CNTL, a selector SEL and a no-operation command unit NOP, and it controls the pipeline operation of the whole microprocessor on the basis of the outputs id0_out, id1_out and the statuses of the respective units (IU, DU, EU, IOU). The processing stages in the pipeline processing are controlled by the pipeline stage control unit Pipe_CNTL of the pipeline control unit PCNT in Fig. 5. Besides, the invalidation processing for the output information of the second instruction decoder ID1 is performed on this side of the pipeline stage control unit Pipe_CNTL.

[0058] More specifically, the output id1_out of the second instruction decoder ID1 is invalidated as follows: This output id1_out of the second instruction decoder ID1 is supplied to one input of the selector SEL. In this embodiment, another input of the selector SEL is supplied with a fixed value NOP, though this is not especially restrictive. The fixed value NOP has quite the same fields as those of the output id1_out, and affords a non-execution command instruction called "no operation". The value NOP may be either identical to or different from the decoded information of an "nop" instruction which is generally employed as the instruction for commanding no operation. What is necessary is that the value NOP commands no operation, and the size of data to be handled, for example, may be designated to any value. The selection of either of the value NOP and the output id1_out in the selector SEL is done in accordance with the information id1_valid which is contained in the output id0_out being the decoded result of the first instruction decoder ID0 and which indicates whether or not the full length of the instruction decoded by the first instruction decoder ID0 is 16 bits. In a case where the instruction length is 16 bits, the output id1_out is selected. To the contrary, in a case where the instruction length exceeds 16 bits, the value NOP is selected. In this way, pipeline control signals pcnt0 and pcnt1 are obtained.
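The selection performed by the selector SEL can be modelled as follows. This is a minimal sketch under stated assumptions, not the circuit of Fig. 5: the `Decoded` record and its fields `op` and `length_bits` are hypothetical stand-ins for the decoded information, and the flag id1_valid is derived here from the length field of id0_out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decoded:
    """Hypothetical stand-in for a decoder output (id0_out or id1_out)."""
    op: str
    length_bits: int = 16


# Fixed value with the same fields as id1_out that commands no operation.
NOP = Decoded(op="nop")


def pcnt_select(id0_out: Decoded, id1_out: Decoded):
    """Model of the selector SEL: return the pair (pcnt0, pcnt1)."""
    # id1_valid is carried in id0_out: true only when the instruction seen
    # by the first decoder is 16 bits long in total.
    id1_valid = id0_out.length_bits == 16
    pcnt0 = id0_out
    pcnt1 = id1_out if id1_valid else NOP
    return pcnt0, pcnt1


if __name__ == "__main__":
    # First instruction is 16 bits: the second decoder's output is used.
    print(pcnt_select(Decoded("add", 16), Decoded("add", 16)))
    # First instruction is 32 bits: the second decoder saw a non-head code,
    # so NOP is issued in its place.
    print(pcnt_select(Decoded("load", 32), Decoded("bogus", 16)))
```

Substituting a NOP rather than suppressing the second output keeps both control signals present in every cycle, so the second execution lane simply does nothing when the assumption fails.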
[0059] Let us suppose again the execution of the instruction row in Fig. 3. The changes of the pipeline control signals pcnt0 and pcnt1 on this occasion are shown in Fig. 6(A). It should be noted that, unlike Fig. 3, Fig. 6(A) represents time in the vertical direction. By way of example, when the statuses in Fig. 4(A) shift into the statuses in Fig. 4(B), the instructions inst0 and inst1 are decoded. This situation is indicated in the uppermost line of Fig. 6(A). In the next machine cycle, the instruction codes inst2_0 and inst2_1 are decoded, and the decoded result of the instruction code inst2_0 and the fixed value NOP are respectively delivered as the signals pcnt0 and pcnt1. Thenceforth, the execution proceeds similarly, and the instructions inst0 through inst6 are subjected to the decode processing in 4 machine cycles.
[0060] Shown in Fig. 6(B) are the changes of the control signal pcnt0 in the case of the prior art where processing similar to the above is performed using only the first instruction decoder ID0. In this case of the prior art, 7 machine cycles are required for the decode processing of the instructions inst0 through inst6, as illustrated in Fig. 6(B).

[0061] Thus, in this embodiment, an instruction decoding capability twice as high is attained at the peak, and a capability equal to that attained with the single instruction decoder is attained even in the worst case.
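The cycle counts quoted above (4 machine cycles with two decoders against 7 with one) can be reproduced with a small counting sketch. It rests on assumptions that are not stated in this passage: that inst3 through inst6 are all 16-bit instructions, and that a pair is decoded together only when the second instruction is also of the shortest format. The function name `decode_cycles` is hypothetical.

```python
from typing import List

SHORTEST = 16  # bits; length of the shortest instruction format


def decode_cycles(lengths: List[int], dual: bool) -> int:
    """Count the machine cycles needed to decode instructions of the given bit lengths."""
    cycles = 0
    i = 0
    while i < len(lengths):
        cycles += 1
        pair = (
            dual
            and lengths[i] == SHORTEST      # id1_valid would be asserted
            and i + 1 < len(lengths)
            and lengths[i + 1] == SHORTEST  # assumption: ID1 handles only the shortest format
        )
        i += 2 if pair else 1
    return cycles


# inst0 .. inst6: inst2 occupies two 16-bit codes (inst2_0, inst2_1);
# the lengths of inst3 .. inst6 are assumed to be 16 bits.
lengths = [16, 16, 32, 16, 16, 16, 16]
print(decode_cycles(lengths, dual=True))   # 4
print(decode_cycles(lengths, dual=False))  # 7
```

With a sequence made up only of longer instructions, the two counts coincide, which corresponds to the worst case mentioned above.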
[0062] Now, the processing of the instructions inst0 and inst1 and of the instruction codes inst2_0 and inst2_1 will be described with more concrete examples. As the examples, it is assumed that the instruction inst0 is the fixed-length instruction of adding the values of the registers R0 and R1 and then setting the result in the register R1, that the instruction inst1 is the fixed-length instruction of adding the values of the registers R2 and R3 and then setting the result in the register R3, and that the instruction inst2 is the variable-length instruction of adding displacement data to the value of the register R4 to obtain an address and then fetching the data of the address from the main memory and setting it in the register R5. Here, the instruction code inst2_0 is the operation code, and the instruction code inst2_1 is the displacement data.
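The three example instructions can be read as the following register transfers. This is only an illustrative sketch of their architectural effect, not of the pipeline itself; the dictionary-based register file and memory, the 32-bit wrap-around and the function name `run_example` are assumptions made for the illustration.

```python
from typing import Dict


def run_example(regs: Dict[str, int], memory: Dict[int, int], displacement: int) -> None:
    """Apply inst0, inst1 and inst2 to a register file, in program order."""
    mask = 0xFFFFFFFF  # assumption: 32-bit registers with wrap-around

    # inst0 (fixed length): R1 <- R0 + R1
    regs["R1"] = (regs["R0"] + regs["R1"]) & mask
    # inst1 (fixed length): R3 <- R2 + R3
    regs["R3"] = (regs["R2"] + regs["R3"]) & mask
    # inst2 (variable length): R5 <- memory[R4 + displacement],
    # where inst2_0 is the operation code and inst2_1 carries the displacement
    address = (regs["R4"] + displacement) & mask
    regs["R5"] = memory[address]


regs = {"R0": 1, "R1": 2, "R2": 3, "R3": 4, "R4": 0x100, "R5": 0}
memory = {0x108: 99}
run_example(regs, memory, displacement=8)
print(regs["R1"], regs["R3"], regs["R5"])  # 3 7 99
```

Because inst0 and inst1 touch disjoint registers, their order does not matter here, which is what allows them to be executed in parallel.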

[0063] First, the processing of the instructions inst0 and inst1 will be described.

[0064] The two instructions inst0 and inst1 are delivered to the 96-bit signal lines laid from the prefetch queue PFQ, in the manner shown in Fig. 4(A). Then, the instruction inst0 is decoded by the first instruction decoder ID0, while the instruction inst1 is decoded by the second instruction decoder ID1.

[0065] In this case, it is decided as the result of the decoding of the instruction inst0 that this instruction inst0 is the shortest instruction. The result of the decision is indicated by asserting the signal id1_valid in the decoded result id0_out. The outputs id0_out and id1_out are respectively delivered as the control signals pcnt0 and pcnt1 through the pipeline control unit PCNT described before. Subsequently, the operation stated below is performed by the commands of these control signals.

[0066] The value of the register R0 and that of the register R1 are respectively delivered to the signal lines d0 and d1 in accordance with the command of the control signal pcnt0. Simultaneously, the value of the register R2 and that of the register R3 are respectively delivered to the signal lines d2 and d3 in accordance with the command of the control signal pcnt1. Subsequently, the arithmetic logic unit ALU0 adds the values of the signal lines d0 and d1 and delivers the sum to the signal lines e0, while the arithmetic logic unit ALU1 adds the values of the signal lines d2 and d3 and delivers the sum to the signal lines e1. Further, at the succeeding stage of register store, the value of the signal lines e0 is set in the register R1, and the value of the signal lines e1 in the register R3.

[0067] Next, the processing operation of the instruction inst2 will be described.

[0068] The instruction inst2 is delivered to the 96-bit signal lines laid from the prefetch queue PFQ, in the manner shown in Fig. 4(B). Then, the instruction code inst2_0 is decoded in the first instruction decoder ID0, while the instruction code inst2_1 is decoded in the second instruction decoder ID1 under the assumption that it is the head code of the next instruction.

[0069] In this case, it is decided as the result of the decoding of the instruction code inst2_0 that the instruction inst2 is a non-shortest instruction. The result of the decision is indicated by negating the signal id1_valid in the decoded result id0_out. The output id0_out is delivered as the control signal pcnt0 through the pipeline control unit PCNT described before. Simultaneously, since the signal id1_valid is negated, the instruction NOP commanding no operation is selected by the selector SEL in the pipeline control unit PCNT and is delivered as the control signal pcnt1. Subsequently, an operation to be stated below is performed by the commands of these control signals.

[0070] The value of the register R4 is delivered to the signal lines d0 in accordance with the command of the control signal pcnt0. Also, the displacement data inst2_1 of 16 bits is expanded into 32 bits by the expansion part generator EG, and the expanded data is delivered to the signal lines d1.

[0071] Besides, since the command of the control signal pcnt1 is the value NOP, any output is not especially delivered to the signal lines d2 and d3. Subsequently, the integral arithmetic logic unit ALU0 adds the values of the signal lines d0 and d1 (for calculating the address) and delivers the sum to the signal lines e0. The command for the arithmetic logic unit ALU1 is also the value NOP, and any output is not especially delivered to the signal lines e1.

[0072] Further, at the succeeding stage, in accordance with the command of the control signal pcnt0, that address of the main memory which is specified by the value of the signal lines e0 is accessed to fetch an operand, and the fetched data is set in the register R5. Since the commands of the control signal pcnt1 for the interface unit IOU and the instruction decode unit DU (register store) are the value NOP, the main memory is not accessed, and any value is not transferred or set to or in any register from the signal lines e1, either.

[0073] According to this embodiment, the throughput of the processing of the whole microprocessor is enhanced, and CPI (the number of machine cycles required for executing one instruction) can be rendered less than one.

[0074] Moreover, a plurality of instruction decoders may include only one instruction decoder capable of decoding all the instruction formats. The remaining instruction decoders may have merely the function of decoding the shortest instruction format. Therefore, the decoding of a plurality of instructions can be realized with a small quantity of hardware. This merit results also in reducing the quantities of processing for testing and diagnosing the microprocessor and in shortening the time periods of the processing.

[0075] Besides, an instruction code to be input to the plurality of instruction decoders is uniquely divided by the length of the shortest instruction format, and the resulting elements are input to the respective instruction decoders. That is, the inputs of all the instruction decoders are selected with ease. This merit is useful for the realization of a high speed together with the suppression of the quantity of hardware.

[0076] The embodiment of the present invention is also applicable to a microprocessor which has a fixed-length instruction set. More specifically, most of the plurality of instruction decoders are permitted to decode only instructions of high frequency in use, whereby the instruction decoders for processing the plurality of instructions in parallel, which have a small quantity of hardware and which operate at high speed, can be realized.

[0077] In addition, irrespective of the fixed-length instruction set and the variable-length instruction set, instructions which each instruction decoder is capable of decoding can be determined in correspondence with a circuit which the instruction decoder controls. By way of example, an instruction decoder for controlling an arithmetic logic unit is capable of decoding only arithmetic logic instructions, and for any other instruction, it produces a result indicating that it has failed to decode the instruction. This measure brings forth the effect that the number of signal lines to be laid from the instruction decoder to the controlled circuit decreases.

[0078] The present invention makes it possible to decode a plurality of fixed-length instructions in parallel in a variable-length instruction set. As compared with the prior-art method, accordingly, the invention enhances the maximum throughput of an instruction decoding performance.

Claims

1. A microprocessor comprising:

a fetch unit (IU) which fetches at least one instruction from outside of said microprocessor, and which delivers said at least one instruction to output lines (i0-i5), said output lines (i0-i5) having a bit width that is at least double a predetermined bit width;

a first instruction decoder (ID0) whose input is supplied with a first output which is delivered from said fetch unit on one of said output lines (i0);

a second instruction decoder (ID1) whose input is supplied with a second output which is delivered from said fetch unit on the other one of said output lines (i1);

a control unit (PCNT) which is supplied with a first decoded result (id0_out) of said first instruction decoder (ID0) and a second decoded result (id1_out) of said second instruction decoder (ID1); and

an instruction execution unit (EU) which responds to an output from said control unit (PCNT);

characterized in that said first instruction decoder decodes a predetermined set of instructions executed in said instruction execution unit, and said second instruction decoder decodes a part of said predetermined set of instructions;

that said control unit (PCNT) includes a selector (SEL), a first input and a second input of which are respectively supplied with said second decoded result (id1_out) of said second instruction decoder (ID1) and a non-execution command instruction (NOP), and a control input of which is supplied with an information (id1_valid) indicating whether or not said first output is an instruction having an instruction length of said predetermined bit width;

that, when each of said first and second outputs is an instruction having an instruction length of said predetermined bit width, said selector (SEL) transmits said second decoded result in response to said information, so that said instruction execution unit (EU) executes said first output and said second output in parallel in response to said first decoded result and said second decoded result, and

that, when said first output is a part of an instruction having an instruction length longer than said predetermined bit width, said selector (SEL) transmits said non-execution command instruction (NOP) in response to said information in order to invalidate the second decoded result, so that said instruction execution unit (EU) executes said first output in response to said first decoded result.

2. A microprocessor according to claim 1, wherein, when said control unit (PCNT) invalidates the decoded result (id1_out) of said second instruction decoder (ID1), said instruction execution unit (EU) determines an address of an operand in response to that bit information of said output lines of said fetch unit (IU) which corresponds to a bit position of the invalidated decoded result of said second instruction decoder.

3. A microprocessor according to claim 1, wherein the predetermined bit width is the shortest instruction length.


Patent Claims (German-language version of the claims of EP 0 467 152 B1)

1. A microprocessor comprising:

a fetch unit (IU) which fetches at least one instruction from outside the microprocessor, and which transfers the at least one instruction to output lines (i0-i5) having a bit width that is at least double a predetermined bit width;

a first instruction decoder (ID0) whose input is supplied with a first output transferred by the fetch unit onto one of the output lines (i0);

a second instruction decoder (ID1) whose input is supplied with a second output transferred by the fetch unit onto another of the output lines (i1);

a control unit (PCNT) which is supplied with a first decoded result (id0_out) of the first instruction decoder (ID0) and a second decoded result (id1_out) of the second instruction decoder (ID1); and

an instruction execution unit (EU) which responds to the output of the control unit (PCNT);

characterized in that the first instruction decoder decodes a predetermined instruction set to be executed in the instruction execution unit, and the second instruction decoder decodes a part of the predetermined instruction set;

that the control unit (PCNT) includes a selector (SEL) whose first and second inputs are respectively supplied with the second decoded result (id1_out) of the second instruction decoder (ID1) and a no-operation command instruction (NOP), and whose control input is supplied with an information (id1_valid) indicating whether or not the first output has an instruction width of the predetermined bit width;

that, when each of the first and second outputs is an instruction having an instruction length of the predetermined bit width, the selector (SEL) transmits the second decoded result in response to said information, so that the instruction execution unit (EU) executes the first and second outputs in parallel in response to the first and second decoded results, and

that, when the first output is part of an instruction having an instruction length longer than the predetermined bit width, the selector (SEL) transmits the no-operation command instruction (NOP) in response to said information in order to invalidate the second decoded result, so that the instruction execution unit (EU) executes the first output in response to the first decoded result.

2. A microprocessor according to claim 1, wherein, when the control unit (PCNT) invalidates the decoded result (id1_out) of the second instruction decoder (ID1), the instruction execution unit (EU) determines an operand address in response to that bit information of the output lines of the fetch unit (IU) which corresponds to a bit position of the invalidated decoded result of the second instruction decoder.

3. A microprocessor according to claim 1, wherein the predetermined bit width is the shortest instruction length.


Claims (French-language version)

1. A microprocessor comprising:

a fetch unit (IU) which reads at least one instruction from outside said processor, and which delivers said at least one instruction to output lines (i0 to i5), said output lines (i0 to i5) having a bit width that is at least double a predetermined bit width,

a first instruction decoder (ID0) whose input is supplied with a first output which is delivered by said fetch unit on one of said output lines (i0),

a second instruction decoder (ID1) whose input is supplied with a second output which is delivered by said fetch unit on the other one of said output lines (i1),

a control unit (PCNT) which is supplied with a first decoded result (id0_out) of said first instruction decoder (ID0) and a second decoded result (id1_out) of said second instruction decoder (ID1), and

an instruction execution unit (EU) which responds to an output from said control unit (PCNT),

characterized in that said first instruction decoder decodes a predetermined set of instructions executed in said instruction execution unit, and said second instruction decoder decodes a part of said predetermined set of instructions,

in that said control unit (PCNT) includes a selector (SEL), a first input and a second input of which are respectively supplied with said second decoded result (id1_out) of said second instruction decoder (ID1) and an ineffective command instruction (NOP), and a control input of which is supplied with information (id1_valid) indicating whether or not said first output is an instruction having an instruction length equal to said predetermined bit width,

in that, when each of said first and second outputs is an instruction having an instruction length equal to said predetermined bit width, said selector (SEL) transmits said second decoded result in response to said information, so that said instruction execution unit (EU) executes said first output and said second output in parallel in response to said first decoded result and said second decoded result, and

in that, when said first output is a part of an instruction having an instruction length longer than said predetermined bit width, said selector (SEL) transmits said ineffective command instruction (NOP) in response to said information in order to invalidate the second decoded result, so that said instruction execution unit (EU) executes said first output in response to said first decoded result.

2. A microprocessor according to claim 1, wherein, when said control unit (PCNT) invalidates the decoded result (id1_out) of said second instruction decoder (ID1), said instruction execution unit (EU) determines an address of an operand in response to that bit information of said output lines of said fetch unit (IU) which corresponds to a bit position of the invalidated decoded result of said second instruction decoder.

3. A microprocessor according to claim 1, wherein the predetermined bit width is the shortest instruction length.
